-
Couldn't load subscription status.
- Fork 1.9k
[KERNEL] Add e2e tests for writing tables with collated columns #5357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[KERNEL] Add e2e tests for writing tables with collated columns #5357
Conversation
53c0334 to
9e24ec3
Compare
c05daf9 to
13e75c4
Compare
f8839d0 to
e95e836
Compare
e95e836 to
bc298a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests and approach LGTM, just had some concerns about how we compare throughout the code-base
| return equals(dataType); | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth having this + the other equivalent method or might they satisfy the same goal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think equivalentIgnoreCollations is needed because of the cases like https://github.com/delta-io/delta/pull/5357/files#diff-c9fd0f3e881617ea0f2439a29c32a35e8c32fbcdda229105be76ece4001819acR200. Here we don't to ignore names and metadata.
|
|
||
| ColumnarBatch data = filteredBatch.getData(); | ||
| if (!data.getSchema().equals(tableSchema)) { | ||
| if (!data.getSchema().equivalentIgnoreCollations(tableSchema)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a similar comment on the other PR, but how do we know when/where we should use this instead? How can we be sure all the instances of comparison we do throughout the code-base won't now be an issue? concerned there might be somewhere that would fail only if a test had collations in the schema, which the majority of existing tests won't
It seems like we need to audit everywhere we might compare data types or schemas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried we do this type of comparison all over..
c165c05 to
bb11dc1
Compare
bb11dc1 to
114903b
Compare
Which Delta project/connector is this regarding?
Description
Added e2e tests for writing tables with collated columns. To enable writing collated data, the schema comparison had to be modified in some places.
How was this patch tested?
New tests.
Does this PR introduce any user-facing changes?
Yes, users can now create and write to Delta tables with collated types.